-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add scheduler overprovision for new actions before namespace throttling #5284
Add scheduler overprovision for new actions before namespace throttling #5284
Conversation
@@ -70,7 +70,17 @@ class SchedulingDecisionMaker( | |||
case _ => Future.successful(DecisionResults(Pausing, 0)) | |||
} | |||
} else { | |||
val capacity = limit - existingContainerCountInNs - inProgressContainerCountInNs | |||
val capacity = if (schedulingConfig.allowOverProvisionBeforeThrottle && totalContainers == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if feature is turned on and the action has no containers yet, give it a capacity of 1 on first decision iteration if total containers in use for the namespace are less than the limit * over provision ratio value, thereafter do the normal capacity calculation.
If this is still over the limit, give 0 capacity and follow the normal code path to turn on namespace throttling
@@ -79,7 +89,7 @@ class SchedulingDecisionMaker( | |||
* | |||
* However, if the container exists(totalContainers != 0), the activation is not treated as a failure and the activation is delivered to the container. | |||
*/ | |||
case Running => | |||
case Running if totalContainers == 0 || !schedulingConfig.allowOverProvisionBeforeThrottle => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If capacity == 0 and feature is turned on and still no containers for action could be created, then turn on namespace throttling. If capacity == 0 and feature is turned off and no containers exist, just skip any considerations and turn on namespace throttling same as existing behavior
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel there is a subtle difference in the semantics when the feature is enabled.
Let's say there are one action utilizing 30 containers, and 15 actions with 1 container each.
The totalContainers
is always bigger than 0. Then isn't throttling enabled?
The throttling would be enabled when an activation for another action arrives, and no container is created.
I think it would be better to enable the namespace throttling when the number of containers reaches namespace limit * overprovisoinRatio
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the reasoning for this is so that activations can continue processing when the 15th action gets the last 1 container. If the 15th action can handle the throughput at this point it wouldn't want namespace throttling enabled so it can continue processing activations. You wouldn't want to stop activation processing until the 16th action comes in.
Or am I misremembering and namespace throttling only impacts new actions with 0 containers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or am I misremembering and namespace throttling only impacts new actions with 0 containers?
This is correct.
The namespace throttling is used when there is no action throttling data in ETCD which means no queue is created for the action.
https://github.com/apache/openwhisk/blob/master/core/controller/src/main/scala/org/apache/openwhisk/core/loadBalancer/FPCPoolBalancer.scala#L683
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe I have updated this behavior and added a commit to what you requested in the latest commit.
I think this pr should be good to go except for one remaining issue I just thought of if one of the 15 actions that got the overprovisioning slots and namespace throttling is enabled and one of those actions containers are removed then I'm not sure if namespace throttling will be correctly disabled, I just need to verify that before merging tomorrow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay added case to correctly disable namespace throttling if the container that frees up space is a container from the over-provisioning space in the latest commit. Would appreciate approval on this when you have some time @style95
this is now ready for review @style95 |
Codecov Report
@@ Coverage Diff @@
## master #5284 +/- ##
===========================================
+ Coverage 58.25% 76.57% +18.31%
===========================================
Files 240 240
Lines 14383 14391 +8
Branches 614 605 -9
===========================================
+ Hits 8379 11020 +2641
+ Misses 6004 3371 -2633
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with minor nits.
...duler/src/main/scala/org/apache/openwhisk/core/scheduler/queue/SchedulingDecisionMaker.scala
Show resolved
Hide resolved
...test/scala/org/apache/openwhisk/core/scheduler/queue/test/SchedulingDecisionMakerTests.scala
Outdated
Show resolved
Hide resolved
...test/scala/org/apache/openwhisk/core/scheduler/queue/test/SchedulingDecisionMakerTests.scala
Outdated
Show resolved
Hide resolved
...test/scala/org/apache/openwhisk/core/scheduler/queue/test/SchedulingDecisionMakerTests.scala
Outdated
Show resolved
Hide resolved
…ng (apache#5284) * initial attempt * tests * fix tests * enable throttling when last capacity used in overprovisioning * add case to correctly disable namespace throttling when namespace overprovisioning has space * feedback Co-authored-by: Brendan Doyle <brendand@qualtrics.com>
Description
This is a wip and trying to get feedback if this is an option we could support. This would add two new configs for the scheduler:
If the first config is false, the ratio is not used anywhere. But the purpose of this is to prevent deadlocking on the new scheduler for actions within a namespace due to the new concept of namespace throttling. Since the scheduler can be aggressive with initially over-provisioning, you can get into a scenario where a single action uses up all of the container concurrency of the namespace before another action gets a change to run. What the code change below attempts to do is up the container concurrency limit to the over provision threshold to allow new actions to have a chance to run. This is really needed if one action depends on another. i.e. action a attempts to execute action b, but action a has all of the action concurrency for the namespace. The workload now will never make progress and action a will only ever fail all of its executions. Ultimately the right thing to do is scale out the namespace, but that's manual by a human and can't be done until impact has already occurred and this helps mitigate by slowing throughput rather than total failures.
Example of how it works:
I'll admit this is a little hacky, but I do think we need to account for this throttling case until we are able to support action level container concurrency limits. So any better ideas are welcome.
Related issue and scope
My changes affect the following components
Types of changes
Checklist: